ProMiner: Organism-specific protein name detection using approximate string matching
Authors
Abstract
were required, (2) disambiguation failed because of missing synonyms, e.g. "vertebrate", and (3) in several cases the provided gold standard might be incorrect, as the abstracts in question describe findings in rat or human instead of mouse.

Table 2: Sample of false positive matches in the best submitted mouse search run. Detected matches are marked.

Description              Examples
Unspecific synonym       growth retarded, perinatal lethality, long lived
Wrong context            TGF-beta superfamily, c-myc tumors
Unknown ambiguity        high dose set at MTD or MFD
Doubtful gold standard   interleukin-2, H-2 locus, c-Jun

The first thirty false positive matches of the best submitted results for mouse (run number 3) were analyzed in more detail. Undetected ambiguities are the dominant reason for false positive matches (60%). This number includes unspecific synonyms that were neither detected during curation nor marked as questionable synonyms, detection of synonyms in wrong contexts, and unknown external ambiguities. Detection of genes from other organisms accounts for 13.3% of false matches. In eight cases, the reason for exclusion from the gold standard remained unclear. In conclusion, the ProMiner method provides high sensitivity using approximate matching and retains high specificity due to sensible incorporation of questionable synonyms.
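The percentages quoted above can be cross-checked against the thirty analyzed matches. The following is a minimal Python sketch of that arithmetic only. The category labels are shorthand introduced here, and the counts for the two percentage figures are inferred by rounding, which is an assumption rather than a number stated in the source.

```python
# Cross-check of the false positive breakdown quoted in the abstract.
# Only the total (30), the two percentages, and the count of eight
# come from the text; the labels and rounded counts are assumptions.
TOTAL = 30

breakdown = {
    "undetected ambiguity": round(0.60 * TOTAL),   # quoted as 60%
    "other organism": round(0.133 * TOTAL),        # quoted as 13.3%
    "unclear gold standard exclusion": 8,          # quoted as a count
}

for label, count in breakdown.items():
    print(f"{label}: {count} of {TOTAL} ({count / TOTAL:.1%})")

# The three categories should account for every analyzed match.
assert sum(breakdown.values()) == TOTAL
```

Under these assumptions the three categories cover all thirty matches, so the eight unclear cases correspond to the remaining share of roughly 26.7%.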
Similar resources
A study on company name matching for database integration
In this report we describe an activity of information integration performed on databases with patent data and company indicators. Depending on the application area, this kind of activity is known as record linkage, duplicate detection, record matching, reference reconciliation or other domain-specific terms. In particular, we present a detailed case study on company name matching. We show how t...
Finding Approximate Matches in Large Lexicons
Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and permuted lexicons, and several string matching techniques, including string similarity measures an...
Name Matching in Law Enforcement and Counter-Terrorism
Name matching is an important task in law enforcement and counter-terrorism. This paper briefly describes the nature of the name-matching task, enumerates typical name-matching applications, explains the technical challenges posed by name-matching, and sets forth several important approaches to these technical challenges. 1. THE NAME-MATCHING TASK Name matching is the task of recognizing when tw...
Approximate Multiple Pattern String Matching using Bit Parallelism: A Review
String matching finds all occurrences of a given pattern in a large text, where both are sequences of characters drawn from a finite alphabet. Approximate string matching involves the detection of correct patterns along with the detection of some wrong patterns inside the text. Bit parallelism is a feature that can be used to detect patterns inside the text and is reported to result in mor...
Approximate String Matching for Geographic Names and Personal Names
The problem of matching strings allowing errors has recently gained importance, considering the increasing volume of online textual data. In geotechnologies, approximate string matching algorithms find many applications, such as gazetteers, address matching, and geographic information retrieval. This paper presents a novel method for approximate string matching, developed for the recognition of...
Publication date: 2004